[Feature] Guided Decoding add LLguidance backend #5124
base: develop
Thanks for your contribution!
```python
    and self.fd_config.structured_outputs_config.guided_decoding_backend is not None
    and self.fd_config.structured_outputs_config.guided_decoding_backend == "guidance"
)
if not ErnieArchitectures.contains_ernie_arch(architectures) or is_guidance_backend:
```
Can the vocabularies of all ERNIE models now be loaded with AutoTokenizer? It seemed to cause problems every time before.
The ERNIE 4.5 22B model works, but the 0.3B model crashes.
Without going through this FastTokenizer logic it cannot be used at all, so we are stuck.
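The condition in the diff excerpt above can be read as a small predicate deciding whether the Hugging Face AutoTokenizer path is taken. The sketch below is hypothetical: `StructuredOutputsConfig`, `uses_hf_tokenizer`, and the `is_ernie_arch` flag stand in for FastDeploy's `structured_outputs_config` and `ErnieArchitectures.contains_ernie_arch(architectures)`, and the leading `config is not None` check is an assumption, since the first clause of the real expression is not shown. It illustrates why selecting `guidance` also routes ERNIE models through AutoTokenizer, which is where the 0.3B loading problem shows up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredOutputsConfig:
    # Hypothetical stand-in for fd_config.structured_outputs_config.
    guided_decoding_backend: Optional[str] = None


def uses_hf_tokenizer(config: Optional[StructuredOutputsConfig], is_ernie_arch: bool) -> bool:
    """Return True when the HF AutoTokenizer path should be taken.

    Mirrors the reviewed condition: non-ERNIE models always use the HF
    tokenizer, and the "guidance" backend forces it for ERNIE models too.
    """
    is_guidance_backend = (
        config is not None  # assumed first clause; elided in the excerpt
        and config.guided_decoding_backend is not None
        and config.guided_decoding_backend == "guidance"
    )
    return not is_ernie_arch or is_guidance_backend


print(uses_hf_tokenizer(StructuredOutputsConfig("guidance"), is_ernie_arch=True))  # True
```

For an ERNIE model with any other backend the predicate is False, so the existing FastTokenizer path is kept.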
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             develop    #5124   +/-   ##
==========================================
  Coverage           ?   60.02%
==========================================
  Files              ?      319
  Lines              ?    39010
  Branches           ?     5883
==========================================
  Hits               ?    23414
  Misses             ?    13750
  Partials           ?     1846
```
Flags with carried forward coverage won't be shown.
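The coverage figure follows from the line counts in the report: hits, misses, and partials add up to the 39,010 tracked lines, and coverage is hits divided by that total. A quick check, assuming Codecov's usual definition in which partially covered lines do not count as covered:

```python
# Line counts taken from the Codecov report for #5124.
hits, misses, partials = 23414, 13750, 1846

lines = hits + misses + partials  # every tracked line falls in exactly one bucket
coverage = hits / lines           # partials count against coverage

print(lines)              # 39010
print(f"{coverage:.2%}")  # 60.02%
```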
Pull Request Overview
This PR adds support for llguidance as a new backend for constrained decoding (structured generation) in FastDeploy, enabling grammar-based constraints during token generation. This provides an alternative to the existing XGrammar backend.
- Added llguidance backend implementation with processor, backend, and checker classes
- Integrated llguidance into the configuration system with validation
- Added comprehensive unit tests with mocking support for environments without llguidance
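To make the processor/backend/checker split concrete, here is a toy, dependency-free sketch of the token-masking idea that grammar backends such as llguidance and XGrammar implement. It is not llguidance's API: `ChoiceChecker`, `ChoiceBackend`, `ChoiceProcessor`, the tiny vocabulary, and the "output must be one of a fixed set of strings" grammar are all hypothetical simplifications. The checker validates the constraint up front, the backend turns it into a per-request processor, and the processor reports which next tokens are allowed and advances as tokens are accepted.

```python
from typing import List, Sequence


class ChoiceChecker:
    """Hypothetical checker: rejects malformed constraints before decoding."""

    def check(self, choices: Sequence[str]) -> None:
        if not choices or any(not c for c in choices):
            raise ValueError("choices must be a non-empty list of non-empty strings")


class ChoiceProcessor:
    """Hypothetical per-request processor for a 'one of these strings' grammar."""

    def __init__(self, choices: Sequence[str], vocab: Sequence[str]):
        self.choices = list(choices)
        self.vocab = list(vocab)
        self.text = ""  # output accepted so far

    def _is_viable(self, text: str) -> bool:
        # A partial output is valid while it is a prefix of at least one choice.
        return any(c.startswith(text) for c in self.choices)

    def fill_token_mask(self) -> List[bool]:
        # mask[i] is True when appending vocab[i] keeps the output viable.
        return [self._is_viable(self.text + tok) for tok in self.vocab]

    def accept_token(self, token_id: int) -> bool:
        candidate = self.text + self.vocab[token_id]
        if not self._is_viable(candidate):
            return False  # a masked token; state is left unchanged
        self.text = candidate
        return True

    def is_terminated(self) -> bool:
        # True once the output exactly equals one of the choices.
        return self.text in self.choices


class ChoiceBackend:
    """Hypothetical backend: validates a constraint, then builds a processor."""

    def __init__(self, vocab: Sequence[str]):
        self.vocab = list(vocab)
        self.checker = ChoiceChecker()

    def create_processor(self, choices: Sequence[str]) -> ChoiceProcessor:
        self.checker.check(choices)
        return ChoiceProcessor(choices, self.vocab)


# Usage with a tiny hypothetical vocabulary.
backend = ChoiceBackend(vocab=["y", "es", "n", "o", "x"])
proc = backend.create_processor(["yes", "no"])
print(proc.fill_token_mask())  # [True, False, True, False, False]
```

A real backend compiles JSON Schema, regex, or grammar constraints and computes the mask over the full vocabulary far more efficiently, but the per-step contract (mask, sample, accept) has the same shape.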
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 69 comments.
| File | Description |
|---|---|
| fastdeploy/model_executor/guided_decoding/guidance_backend.py | Core implementation of LLGuidance backend, processor, and checker classes |
| fastdeploy/lazy_loader.py | New utility for lazy-loading modules to avoid pulling in heavy dependencies |
| fastdeploy/model_executor/guided_decoding/__init__.py | Factory integration for llguidance backend and checker |
| fastdeploy/model_executor/guided_decoding/base_guided_decoding.py | Added conditional logic to use HF tokenizer for guidance backend |
| fastdeploy/config.py | Configuration validation and import check for llguidance backend |
| fastdeploy/envs.py | Added environment variables for llguidance configuration |
| requirements_guided_decoding.txt | Added llguidance, torch dependencies |
| tests/model_executor/guided_decoding/test_guidance_*.py | Comprehensive unit tests with mocking support |
| docs/**/parameters.md | Updated parameter documentation to include guidance backend |
| docs/**/structured_outputs.md | Added llguidance backend to feature documentation |
| fastdeploy/model_executor/guided_decoding/xgrammar_backend.py | Removed max_rollback_tokens parameter |
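The table lists a new `fastdeploy/lazy_loader.py` for deferring heavy imports. Its actual interface is not shown on this page, so the sketch below is a hypothetical `LazyModule` illustrating the common pattern: a `types.ModuleType` subclass that performs the real `importlib.import_module` call on first attribute access, so that importing the package does not pull in llguidance or torch until the guidance backend is actually used.

```python
import importlib
import types
from typing import Any, Optional


class LazyModule(types.ModuleType):
    """Hypothetical lazy proxy: imports `module_name` on first attribute access."""

    def __init__(self, module_name: str):
        super().__init__(module_name)
        self._module_name = module_name
        self._module: Optional[types.ModuleType] = None

    def _load(self) -> types.ModuleType:
        if self._module is None:
            # The (possibly expensive) import happens here, not at definition time.
            self._module = importlib.import_module(self._module_name)
        return self._module

    def __getattr__(self, item: str) -> Any:
        # Only called for attributes not found through normal lookup,
        # so _module and _module_name never recurse into here.
        return getattr(self._load(), item)


json_lazy = LazyModule("json")  # nothing imported yet
print(json_lazy.dumps([1, 2]))  # [1, 2]
```

The standard library also offers `importlib.util.LazyLoader`, which defers module execution at the import-system level; a small proxy like this one is easier to read and to mock in unit tests.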
Comments suppressed due to low confidence (8)
tests/model_executor/guided_decoding/test_guidance_checker.py:1
- Comments in lines 22 and 36 are in Chinese (Simplified), but the codebase uses English for comments. According to the custom guidelines, Chinese should only be used for repository members. Since this is a public code contribution, these comments should be translated to English for consistency with the rest of the codebase.
"""
tests/model_executor/guided_decoding/test_guidance_checker.py:1
- Comments in lines 22 and 36 are in Chinese (Simplified), but the codebase uses English for comments. According to the custom guidelines, Chinese should only be used for repository members. Since this is a public code contribution, these comments should be translated to English for consistency with the rest of the codebase.
"""
tests/model_executor/guided_decoding/test_guidance_checker.py:1
- Numerous docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English for consistency with the rest of the codebase. These should be translated.
"""
tests/model_executor/guided_decoding/test_guidance_checker.py:1
- Numerous docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English for consistency with the rest of the codebase. These should be translated.
"""
tests/model_executor/guided_decoding/test_guidance_checker.py:1
- Numerous docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English for consistency with the rest of the codebase. These should be translated.
"""
tests/model_executor/guided_decoding/test_guidance_checker.py:1
- Numerous docstrings and comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments and documentation should be in English for consistency with the rest of the codebase. These should be translated.
"""
tests/model_executor/guided_decoding/test_guidance_backend.py:1
- Multiple inline comments throughout the test file are written in Chinese (Simplified). According to the custom guidelines for external contributors, all comments should be in English for consistency. These should be translated.
"""
fastdeploy/model_executor/guided_decoding/base_guided_decoding.py:152
- The overridden method's signature does not match the call, which passes it too many arguments. The overriding method LLGuidanceBackend._create_processor does match the call.
```python
def _create_processor(self):
```
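A minimal sketch of the fix this comment implies: give the base-class hook the same parameters the caller passes, so every override, including `LLGuidanceBackend._create_processor`, remains substitutable. The real call site at line 152 is not shown here, so the class names and the single `constraint` parameter below are hypothetical.

```python
from typing import Any


class BaseBackend:
    """Hypothetical base class with a processor-creation hook."""

    def get_processor(self, constraint: Any) -> Any:
        # The caller passes an argument, so the hook must accept it.
        return self._create_processor(constraint)

    def _create_processor(self, constraint: Any) -> Any:
        # Previously `def _create_processor(self):`; calling that base version
        # from get_processor would raise TypeError. Subclasses override this.
        raise NotImplementedError


class GuidanceLikeBackend(BaseBackend):
    """Hypothetical subclass standing in for LLGuidanceBackend."""

    def _create_processor(self, constraint: Any) -> Any:
        return ("processor", constraint)


print(GuidanceLikeBackend().get_processor("schema"))  # ('processor', 'schema')
```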
```python
try:
    self.ll_tokenizer = llguidance_hf.from_tokenizer(self.hf_tokenizer, self.vocab_size)
except Exception as e:
    import traceback
```
Copilot (AI), Nov 21, 2025
The traceback module is already imported at the top of the file (line 19). This duplicate import inside the exception handler is redundant and should be removed.
```python
import traceback
```
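Applying that suggestion leaves a single module-level `import traceback`. A sketch of the resulting handler, where `build_ll_tokenizer`, the injected `factory` (standing in for `llguidance_hf.from_tokenizer`), and the logger name are hypothetical, and re-raising after logging is an assumed error policy:

```python
import logging
import traceback
from typing import Any, Callable

logger = logging.getLogger("guided_decoding")  # hypothetical logger name


def build_ll_tokenizer(
    factory: Callable[[Any, int], Any], hf_tokenizer: Any, vocab_size: int
) -> Any:
    """Build the llguidance tokenizer, logging the full traceback on failure."""
    try:
        return factory(hf_tokenizer, vocab_size)
    except Exception as e:
        # `traceback` is imported once at module level; no local re-import needed.
        logger.error("Failed to build llguidance tokenizer: %s\n%s", e, traceback.format_exc())
        raise
```

`logger.exception(...)` would attach the traceback automatically and is an equally idiomatic alternative to formatting it by hand.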
Motivation
This PR adds support for llguidance as a new backend for constrained decoding (structured generation).
Modifications
- Dependency Integration: Added llguidance to requirements_guided_decoding.txt.
- Backend Implementation: Implemented the wrapper/adapter that lets llguidance interface with the inference engine.
- Config Update: Added configuration options to select llguidance as the constrained decoding provider.
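The configuration change pairs backend selection with an import check (`fastdeploy/config.py` in the table above). A sketch of that validation under stated assumptions: the function name and the mapping are hypothetical, only the `"guidance"` → `llguidance` and `"xgrammar"` → `xgrammar` pairing comes from this PR, and `importlib.util.find_spec` is used so the check locates the package without importing it.

```python
import importlib.util
from typing import Dict

# Hypothetical mapping from backend name to the package it requires.
BACKEND_MODULES: Dict[str, str] = {
    "xgrammar": "xgrammar",
    "guidance": "llguidance",
}


def validate_guided_decoding_backend(
    backend: str, backend_modules: Dict[str, str] = BACKEND_MODULES
) -> None:
    """Raise if the backend name is unknown or its package is not installed."""
    if backend not in backend_modules:
        raise ValueError(
            f"Unsupported guided_decoding_backend {backend!r}; "
            f"expected one of {sorted(backend_modules)}"
        )
    module = backend_modules[backend]
    # find_spec returns None for a missing top-level package without importing it.
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"guided_decoding_backend={backend!r} requires the {module!r} package "
            "(see requirements_guided_decoding.txt)"
        )
```

Failing at configuration time gives a clear error at startup instead of an `ImportError` on the first constrained request.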
Usage or Command
See structured_outputs.md (docs/**/structured_outputs.md), which now documents the llguidance backend.
Accuracy Tests
Validity Test: Verified that the generated output strictly adheres to the provided JSON Schema and Regex patterns.
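A validity check of this kind can be sketched with the standard library alone. The helper names and the simplified schema check (required keys plus top-level property types) are assumptions for illustration; a real test would more likely validate against the full JSON Schema with a dedicated validator.

```python
import json
import re
from typing import Any, Dict

# JSON Schema primitive type names mapped to Python types (simplified subset).
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def matches_regex(output: str, pattern: str) -> bool:
    # The whole output must match, not just a prefix.
    return re.fullmatch(pattern, output) is not None


def matches_simple_schema(output: str, schema: Dict[str, Any]) -> bool:
    """Check required keys and top-level property types of a JSON object."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    for key in schema.get("required", []):
        if key not in data:
            return False
    for key, spec in schema.get("properties", {}).items():
        if key not in data or "type" not in spec:
            continue
        expected = _JSON_TYPES.get(spec["type"])
        if expected is None:
            continue  # types outside the simplified subset are not checked
        value = data[key]
        # bool is a subclass of int in Python; don't let true/false pass as numbers.
        if isinstance(value, bool) and spec["type"] != "boolean":
            return False
        if not isinstance(value, expected):
            return False
    return True
```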
Checklist
- Tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- … `pre-commit` before commit.
- … `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.